RSPEED-2326: feat(rlsapi): integrate Splunk telemetry into v1 /infer endpoint #1050
Conversation
Walkthrough
Adds Splunk HEC telemetry emission: the inference endpoint now accepts FastAPI `Request` and `BackgroundTasks` parameters, extracts RH Identity context, times inference, and queues success/error telemetry events for background sending to Splunk HEC.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant InferEndpoint as "infer_endpoint\n(Request, BackgroundTasks)"
    participant RHContext as "RH Identity\nContext"
    participant Processor as "Inference\nProcessing"
    participant BackgroundTasks as "BackgroundTasks"
    participant SplunkHEC as "Splunk HEC"
    Client->>InferEndpoint: POST /rlsapi/v1/infer
    InferEndpoint->>RHContext: extract org_id, system_id
    RHContext-->>InferEndpoint: context or AUTH_DISABLED
    InferEndpoint->>InferEndpoint: start inference timer
    InferEndpoint->>Processor: run inference
    alt Success
        Processor-->>InferEndpoint: response_text
        InferEndpoint->>InferEndpoint: compute inference_time
        InferEndpoint->>BackgroundTasks: add_task(_queue_splunk_event, success payload + RH context)
    else API/RateLimit/Status Error
        Processor-->>InferEndpoint: raises error
        InferEndpoint->>InferEndpoint: compute inference_time
        InferEndpoint->>BackgroundTasks: add_task(_queue_splunk_event, error payload + RH context)
    end
    InferEndpoint-->>Client: response or HTTP error
    BackgroundTasks->>SplunkHEC: async send event
    SplunkHEC-->>BackgroundTasks: acknowledgement
```
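The flow above can be sketched as a small, framework-free Python model. This is a simplification for illustration only: the `BackgroundTasks` stand-in and the `SENT_EVENTS` list are hypothetical stand-ins for FastAPI's background-task machinery and the Splunk HEC endpoint, and the payload fields are illustrative rather than the PR's actual schema.

```python
import time

SENT_EVENTS = []  # hypothetical stand-in for the Splunk HEC destination


class BackgroundTasks:
    """Toy stand-in for FastAPI's BackgroundTasks: collect callables and
    run them only after the response has been 'sent'."""

    def __init__(self):
        self.tasks = []

    def add_task(self, func, *args):
        self.tasks.append((func, args))

    def run_all(self):
        for func, args in self.tasks:
            func(*args)


def _queue_splunk_event(background_tasks, payload, sourcetype):
    # Build the event eagerly, but defer the send to a background task so
    # telemetry never blocks the main request flow.
    event = {"sourcetype": sourcetype, "event": payload}
    background_tasks.add_task(SENT_EVENTS.append, event)


def infer_endpoint(prompt, background_tasks, run_inference):
    start = time.monotonic()
    try:
        text = run_inference(prompt)
        inference_time = time.monotonic() - start
        _queue_splunk_event(
            background_tasks,
            {"response": text, "inference_time": inference_time},
            "infer_with_llm",
        )
        return {"status": 200, "text": text}
    except ConnectionError as e:
        inference_time = time.monotonic() - start
        _queue_splunk_event(
            background_tasks,
            {"error": str(e), "inference_time": inference_time},
            "infer_error",
        )
        return {"status": 503, "text": str(e)}


tasks = BackgroundTasks()
resp = infer_endpoint("hello", tasks, lambda p: p.upper())
tasks.run_all()  # the framework runs queued tasks after the response is sent
print(resp["status"], SENT_EVENTS[0]["sourcetype"])
```

The key property the diagram shows: the event is queued inside the request handler, but the send to Splunk happens only after the client already has its response.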
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
…endpoint

- Add _get_rh_identity_context() to extract org_id/system_id from request.state
- Add _queue_splunk_event() to build and queue telemetry events via BackgroundTasks
- Add timing measurement around inference calls
- Queue infer_with_llm events on success, infer_error on failure
- Add unit tests for RH Identity context extraction and Splunk integration
- Update integration tests for new endpoint signature
- Add user-facing docs (docs/splunk.md) and developer docs (src/observability/README.md)

Signed-off-by: Major Hayden <major@redhat.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/splunk.md`:
- Around line 139-142: The fenced log snippet in docs/splunk.md is missing a
language tag which triggers markdownlint MD040; update the code fence
surrounding the log line "Splunk HEC request failed with status 403: Invalid
token" from ``` to ```text so the block is explicitly marked as plain text and
renders/lints correctly.
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 262-317: The except branches currently call
_queue_splunk_event(...) then raise HTTPException(...), which drops FastAPI
background tasks and loses telemetry; instead, after recording inference_time,
metrics, logging, and calling _queue_splunk_event(...), construct the
appropriate response model (ServiceUnavailableResponse, QuotaExceededResponse,
or InternalServerErrorResponse.generic()), convert it to a dict via
model_dump(), and return a fastapi.responses.JSONResponse with that body, the
correct status_code, and background=background_tasks so the queued telemetry
runs; update the handlers for APIConnectionError, RateLimitError, and
APIStatusError (the except blocks around retrieve_simple_response) to return
JSONResponse(...) rather than raising HTTPException.
In `@tests/integration/endpoints/test_rlsapi_v1_integration.py`:
- Around line 40-46: In _create_mock_request replace the deletion of the
non-existent attribute (del mock_request.state.rh_identity_data) with explicitly
setting mock_request.state.rh_identity_data = None so it won't raise
AttributeError; update the mock setup in the _create_mock_request function to
assign None instead of deleting the attribute to reflect absent identity data.
In `@tests/unit/app/endpoints/test_rlsapi_v1.py`:
- Around line 40-50: The helper _create_mock_request should avoid
unconditionally deleting mock_request.state.rh_identity_data because del will
raise AttributeError if the attribute doesn't exist; instead, check for
existence (e.g., hasattr/getattr) and only delete or set to None when present.
Update the _create_mock_request implementation to conditionally remove or clear
mock_request.state.rh_identity_data using a safe existence check so tests won’t
break when the attribute was never set.
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)
182-196: Bring helper docstrings up to Google style.
_get_cla_version and _queue_splunk_event need full Args/Returns sections to match module conventions. As per coding guidelines, please keep Google-style docstrings for all functions.
✍️ Suggested docstring shape

```diff
-def _get_cla_version(request: Request) -> str:
-    """Extract CLA version from User-Agent header."""
+def _get_cla_version(request: Request) -> str:
+    """Extract CLA version from the User-Agent header.
+
+    Args:
+        request: The FastAPI request object.
+
+    Returns:
+        The User-Agent header value or an empty string.
+    """
@@
 def _queue_splunk_event(
@@
 ) -> None:
-    """Build and queue a Splunk telemetry event for background sending."""
+    """Build and queue a Splunk telemetry event for background sending.
+
+    Args:
+        background_tasks: FastAPI background tasks queue.
+        infer_request: Inference request payload.
+        request: FastAPI request for headers/state.
+        request_id: Unique request identifier.
+        response_text: LLM response or error message.
+        inference_time: Total inference duration in seconds.
+        sourcetype: Splunk sourcetype (e.g., infer_with_llm).
+    """
```
4. Check service logs for warning messages:

```
Splunk HEC request failed with status 403: Invalid token
```
Add a language to the fenced log snippet.
This avoids markdownlint MD040 and improves rendering.
🔧 Suggested fix

````diff
-```
-Splunk HEC request failed with status 403: Invalid token
-```
+```text
+Splunk HEC request failed with status 403: Invalid token
+```
````

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
140-140: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/splunk.md` around lines 139 - 142, The fenced log snippet in
docs/splunk.md is missing a language tag which triggers markdownlint MD040;
update the code fence surrounding the log line "Splunk HEC request failed with
status 403: Invalid token" from ``` to ```text so the block is explicitly marked
as plain text and renders/lints correctly.
```python
start_time = time.monotonic()
try:
    response_text = await retrieve_simple_response(input_source, instructions)
    inference_time = time.monotonic() - start_time
except APIConnectionError as e:
    inference_time = time.monotonic() - start_time
    metrics.llm_calls_failures_total.inc()
    logger.error(
        "Unable to connect to Llama Stack for request %s: %s", request_id, e
    )
    _queue_splunk_event(
        background_tasks,
        infer_request,
        request,
        request_id,
        str(e),
        inference_time,
        "infer_error",
    )
    response = ServiceUnavailableResponse(
        backend_name="Llama Stack",
        cause=str(e),
    )
    raise HTTPException(**response.model_dump()) from e
except RateLimitError as e:
    inference_time = time.monotonic() - start_time
    metrics.llm_calls_failures_total.inc()
    logger.error("Rate limit exceeded for request %s: %s", request_id, e)
    _queue_splunk_event(
        background_tasks,
        infer_request,
        request,
        request_id,
        str(e),
        inference_time,
        "infer_error",
    )
    response = QuotaExceededResponse(
        response="The quota has been exceeded", cause=str(e)
    )
    raise HTTPException(**response.model_dump()) from e
except APIStatusError as e:
    inference_time = time.monotonic() - start_time
    metrics.llm_calls_failures_total.inc()
    logger.exception("API error for request %s: %s", request_id, e)
    _queue_splunk_event(
        background_tasks,
        infer_request,
        request,
        request_id,
        str(e),
        inference_time,
        "infer_error",
    )
    response = InternalServerErrorResponse.generic()
    raise HTTPException(**response.model_dump()) from e
```
🧩 Analysis chain
🌐 Web query:
FastAPI BackgroundTasks executed when raising HTTPException
💡 Result:
BackgroundTasks in FastAPI (Starlette) only run if they are attached to the Response that gets sent—they execute after the response is sent. If you raise HTTPException, FastAPI aborts the normal return path and builds a new error response, so the tasks you added in the endpoint won’t be attached to the response and won’t run. [1], [2]
Options
A) Return an error response instead of raising, with a background task attached

```python
from fastapi import FastAPI
from starlette.responses import JSONResponse
from starlette.background import BackgroundTask

app = FastAPI()

def log_failure(msg: str):
    ...

@app.get("/x")
def x():
    return JSONResponse(
        {"detail": "bad"},
        status_code=400,
        background=BackgroundTask(log_failure, "bad"),
    )
```

(Background tasks must be attached to the response.) [2]
B) If you must raise HTTPException, attach tasks in a custom exception handler

```python
from fastapi import FastAPI, BackgroundTasks
from starlette.exceptions import HTTPException as StarletteHTTPException
from starlette.responses import PlainTextResponse

app = FastAPI()

def log_failure(msg: str):
    ...

@app.exception_handler(StarletteHTTPException)
async def http_exc_handler(request, exc):
    tasks = BackgroundTasks()
    tasks.add_task(log_failure, str(exc.detail))
    return PlainTextResponse(str(exc.detail), status_code=exc.status_code, background=tasks)
```

(Workaround: build the error response yourself and attach tasks there.) [3], [4]
C) If you need “always run”, use try/finally (not BackgroundTasks)
Because BackgroundTasks is specifically “run after a response is sent,” it’s not the right tool for cleanup that must happen even when you abort early. [1]
Sources: FastAPI BackgroundTasks docs [1]; Starlette Background Tasks docs [2]; Stack Overflow workaround using custom exception handler [3]; Starlette exceptions/response behavior [4].
Background tasks will not execute when raising HTTPException—adjust to preserve telemetry.
FastAPI background tasks only execute if attached to the returned Response. Raising HTTPException aborts the normal response path, causing the _queue_splunk_event() background tasks to be dropped and infer_error telemetry to be lost. Return a JSONResponse with the background parameter instead, or implement a custom exception handler to attach tasks to the error response.
Fix pattern (apply to each error branch)

```diff
+from fastapi.responses import JSONResponse
@@
     except APIConnectionError as e:
         inference_time = time.monotonic() - start_time
         metrics.llm_calls_failures_total.inc()
         logger.error(
             "Unable to connect to Llama Stack for request %s: %s", request_id, e
         )
         _queue_splunk_event(
             background_tasks,
             infer_request,
             request,
             request_id,
             str(e),
             inference_time,
             "infer_error",
         )
         response = ServiceUnavailableResponse(
             backend_name="Llama Stack",
             cause=str(e),
         )
-        raise HTTPException(**response.model_dump()) from e
+        error_payload = response.model_dump()
+        return JSONResponse(
+            status_code=error_payload["status_code"],
+            content=error_payload["detail"],
+            background=background_tasks,
+        )
```

🤖 Prompt for AI Agents
In `@src/app/endpoints/rlsapi_v1.py` around lines 262 - 317, The except branches
currently call _queue_splunk_event(...) then raise HTTPException(...), which
drops FastAPI background tasks and loses telemetry; instead, after recording
inference_time, metrics, logging, and calling _queue_splunk_event(...),
construct the appropriate response model (ServiceUnavailableResponse,
QuotaExceededResponse, or InternalServerErrorResponse.generic()), convert it to
a dict via model_dump(), and return a fastapi.responses.JSONResponse with that
body, the correct status_code, and background=background_tasks so the queued
telemetry runs; update the handlers for APIConnectionError, RateLimitError, and
APIStatusError (the except blocks around retrieve_simple_response) to return
JSONResponse(...) rather than raising HTTPException.
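The failure mode described above can be demonstrated without FastAPI at all. The toy model below is my own simplification, not Starlette's actual internals: it shows why tasks added to a `BackgroundTasks` object that never gets attached to the returned response are silently dropped when the handler raises.

```python
RAN = []  # records which background tasks actually executed


class BackgroundTasks:
    def __init__(self):
        self.tasks = []

    def add_task(self, func, *args):
        self.tasks.append((func, args))


class Response:
    def __init__(self, status_code, background=None):
        self.status_code = status_code
        self.background = background


class HTTPException(Exception):
    def __init__(self, status_code):
        self.status_code = status_code


def send(response):
    """The 'framework' sends the response, then runs its attached tasks."""
    if response.background is not None:
        for func, args in response.background.tasks:
            func(*args)
    return response.status_code


def call_endpoint(endpoint):
    tasks = BackgroundTasks()
    try:
        return send(endpoint(tasks))
    except HTTPException as exc:
        # The framework builds a *fresh* error response; the endpoint's
        # BackgroundTasks object is never attached to it, so tasks are lost.
        return send(Response(exc.status_code, background=None))


def raising_endpoint(tasks):
    tasks.add_task(RAN.append, "telemetry")
    raise HTTPException(503)


def returning_endpoint(tasks):
    tasks.add_task(RAN.append, "telemetry")
    return Response(503, background=tasks)


call_endpoint(raising_endpoint)    # task queued, then dropped
dropped = list(RAN)                # still empty
call_endpoint(returning_endpoint)  # task attached to the response, so it runs
print(dropped, RAN)
```

Both endpoints return the same 503 status, but only the one that returns a response with `background=tasks` gets its telemetry emitted, which is exactly the fix the review proposes.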
```python
def _create_mock_request(mocker: MockerFixture) -> Any:
    """Create a mock FastAPI Request with minimal state."""
    mock_request = mocker.Mock()
    mock_request.state = mocker.Mock()
    mock_request.headers = {"User-Agent": "CLA/0.4.0"}
    del mock_request.state.rh_identity_data
    return mock_request
```
Same mock deletion issue here.
Deleting an attribute that was never set can raise AttributeError. Use an explicit None instead.
🔧 Suggested fix

```diff
-    del mock_request.state.rh_identity_data
+    mock_request.state.rh_identity_data = None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _create_mock_request(mocker: MockerFixture) -> Any:
    """Create a mock FastAPI Request with minimal state."""
    mock_request = mocker.Mock()
    mock_request.state = mocker.Mock()
    mock_request.headers = {"User-Agent": "CLA/0.4.0"}
    mock_request.state.rh_identity_data = None
    return mock_request
```
🤖 Prompt for AI Agents
In `@tests/integration/endpoints/test_rlsapi_v1_integration.py` around lines 40 -
46, In _create_mock_request replace the deletion of the non-existent attribute
(del mock_request.state.rh_identity_data) with explicitly setting
mock_request.state.rh_identity_data = None so it won't raise AttributeError;
update the mock setup in the _create_mock_request function to assign None
instead of deleting the attribute to reflect absent identity data.
```python
def _create_mock_request(mocker: MockerFixture, rh_identity: Any = None) -> Any:
    """Create a mock FastAPI Request with optional RH Identity data."""
    mock_request = mocker.Mock()
    mock_request.state = mocker.Mock()
    mock_request.headers = {"User-Agent": "CLA/0.4.0"}

    if rh_identity is not None:
        mock_request.state.rh_identity_data = rh_identity
    else:
        del mock_request.state.rh_identity_data
```
Mock attribute deletion can raise AttributeError.
del mock_request.state.rh_identity_data will fail when the attribute hasn’t been created yet. This makes the helper brittle and can break multiple tests.
🔧 Suggested fix

```diff
-    else:
-        del mock_request.state.rh_identity_data
+    else:
+        mock_request.state.rh_identity_data = None
```

🤖 Prompt for AI Agents
In `@tests/unit/app/endpoints/test_rlsapi_v1.py` around lines 40 - 50, The helper
_create_mock_request should avoid unconditionally deleting
mock_request.state.rh_identity_data because del will raise AttributeError if the
attribute doesn't exist; instead, check for existence (e.g., hasattr/getattr)
and only delete or set to None when present. Update the _create_mock_request
implementation to conditionally remove or clear
mock_request.state.rh_identity_data using a safe existence check so tests won’t
break when the attribute was never set.
Description
Integrates Splunk HEC telemetry into the rlsapi v1 /infer endpoint. This is the final PR in the Splunk integration series (building on #1031 and #1032).

Changes:

- _get_rh_identity_context() helper to extract org_id/system_id from request.state
- _queue_splunk_event() to build and queue telemetry events via FastAPI BackgroundTasks
- infer_with_llm events on success, infer_error events on failure
- User-facing docs (docs/splunk.md)
- Developer docs (src/observability/README.md)

Events are sent asynchronously and never block or affect the main request flow.
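The PR description does not show the wire format of the queued events. As a rough illustration of what a Splunk HEC payload generally looks like, here is a hedged sketch: the HEC envelope keys (`time`, `sourcetype`, `event`) follow Splunk's standard event format, but the field names inside `event` and the `build_hec_event` helper itself are hypothetical, not taken from this PR.

```python
import json
import time


def build_hec_event(org_id, system_id, request_id, response_text,
                    inference_time, sourcetype):
    """Assemble a Splunk HEC-style envelope; `event` carries the custom fields.

    Field names inside `event` are illustrative assumptions.
    """
    return {
        "time": time.time(),
        "sourcetype": sourcetype,  # e.g. "infer_with_llm" or "infer_error"
        "event": {
            "org_id": org_id,
            "system_id": system_id,
            "request_id": request_id,
            "response": response_text,
            "inference_time": inference_time,
        },
    }


payload = build_hec_event("123", "abc", "req-1", "ok", 0.42, "infer_with_llm")
# In a real sender this JSON body would be POSTed to the HEC endpoint
# (typically /services/collector/event) with an Authorization: Splunk <token> header.
body = json.dumps(payload)
print(payload["sourcetype"])
```

Because the event is fully built before being queued, the background task only needs to perform the HTTP send, keeping failure handling (such as the 403 log warning documented in docs/splunk.md) isolated from the request path.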
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
- uv run pytest tests/unit/app/endpoints/test_rlsapi_v1.py -v (24 tests)
- uv run pytest tests/integration/endpoints/test_rlsapi_v1_integration.py -v (11 tests)
- uv run pytest tests/unit/observability/ -v (11 tests)

New tests verify:
Summary by CodeRabbit
New Features
Documentation
Tests